
Conversation

@3l1 (Contributor) commented Apr 17, 2025

Summary: Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758

@3l1 3l1 requested a review from digantdesai as a code owner April 17, 2025 21:13
pytorch-bot bot commented Apr 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10279

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 758e29c with merge base 3997ae9:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Apr 17, 2025
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

@3l1 (Contributor, Author) commented Apr 17, 2025

@pytorchbot label "topic: not user facing"

@3l1 3l1 force-pushed the export-D73212758 branch from 4d1f3b0 to cccb1e2 Compare April 21, 2025 02:44
3l1 added a commit to 3l1/executorch that referenced this pull request Apr 21, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request Apr 21, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from cccb1e2 to 87434e3 Compare April 21, 2025 02:48
    macs: int = 128,
    system_config: str = "Ethos_U55_High_End_Embedded",
-   memory_mode: str = "Shared_Sram",
+   memory_mode: str = "Sram_Only",
Contributor commented on this diff:

What is the rationale for changing the default? I don't see any issues; I'm asking because I am not sure why this was the default before.

cc @freddan80, @zingo

@gggekov (Collaborator) commented May 2, 2025

Hi @digantdesai @3l1,
For the U55, Shared_Sram is the most widely used memory mode. The reason is that most embedded SoCs have a limited amount of SRAM, so most people prefer to place the NN in Flash and use the SRAM only for the scratch buffer (the SRAM is also used by the SW stack outside of the inference, e.g. pre/post processing, running an RTOS, etc.). If you have a small model and an SoC with enough SRAM for the NN, the scratch buffer and your overall SW stack, then yes, it makes sense to use Sram_Only and you will get the performance benefit of the lower latency/higher bandwidth for the NN, but I would say this is the exception rather than the rule.

Also, the Corstone-300 has 2 MB of SRAM and 256 MB of DDR (which can behave like Flash given the Timing Adapter parameters). Right now, for greater flexibility, we place everything in the DDR in the linker script. Then, via the REGIONCFG registers, we tell the NPU to read the weights over AXI1 (which provides Flash latency/BW thanks to the Timing Adapter settings) and read the scratch buffer over AXI0 (which provides SRAM latency and BW thanks to the TAs). On silicon you can't do that: to get good performance from Sram_Only, you have to place both the NN and the scratch buffer in the SRAM. I have a patch in internal review that moves the allocation of the scratch buffer into an array in the SRAM.

The same considerations apply to the U85 as well. In addition to Shared_Sram, the U85 will often be used on SoCs with a Cortex-A and DRAM, which is why we provide the Dedicated_Sram mode.

@3l1 maybe we should look at how to let you easily change the memory mode, perhaps from the command line? Is there a way to do that which doesn't involve changing the default memory mode we test against?
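
One possible shape for that suggestion, as a hedged sketch rather than an existing ExecuTorch option: expose the memory mode as a command-line flag in the AoT example and forward the user's choice, keeping Shared_Sram as the tested default. The flag name and the build_compile_spec() stub below are illustrative assumptions.

import argparse

def build_compile_spec(memory_mode: str) -> dict:
    # Stub standing in for the real compile-spec builder; returns the chosen settings.
    return {
        "system_config": "Ethos_U55_High_End_Embedded",
        "memory_mode": memory_mode,
    }

def main() -> None:
    parser = argparse.ArgumentParser(description="Ethos-U AoT example (sketch)")
    parser.add_argument(
        "--memory_mode",
        choices=["Shared_Sram", "Sram_Only", "Dedicated_Sram"],
        default="Shared_Sram",  # keep the widely used, CI-tested default
        help="Vela memory mode; use Sram_Only only when the NN, scratch buffer and SW stack all fit in SRAM.",
    )
    args = parser.parse_args()
    print(build_compile_spec(memory_mode=args.memory_mode))

if __name__ == "__main__":
    main()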

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

@3l1 3l1 force-pushed the export-D73212758 branch from 87434e3 to a2fae41 Compare May 1, 2025 23:42
3l1 added a commit to 3l1/executorch that referenced this pull request May 1, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from a2fae41 to 836ea7e Compare May 2, 2025 02:29
@zingo zingo requested a review from gggekov May 2, 2025 07:20
@zingo added the module: arm label (Issues related to arm backend) May 2, 2025
@gggekov (Collaborator) left a review:

(The review body repeats the inline comment above.)

@3l1 perhaps we should look at how to let you easily change the memory mode, e.g. from the command line? Is there a way to do that which doesn't involve changing the default memory mode?

@3l1 (Contributor, Author) commented May 2, 2025

You can ignore this pull request :) (I'm not sure how to cancel it in this UI)

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 836ea7e to b1c3398 Compare May 2, 2025 18:51
3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from b1c3398 to befb9ff Compare May 2, 2025 19:54
3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from befb9ff to 18c26f4 Compare May 2, 2025 19:56
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 18c26f4 to 9f67a99 Compare May 2, 2025 19:58
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 9f67a99 to 847f57e Compare May 2, 2025 20:04
3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 847f57e to 873f99e Compare May 2, 2025 23:39
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D73212758

@3l1 3l1 force-pushed the export-D73212758 branch from 873f99e to 758e29c Compare May 2, 2025 23:42
@zingo zingo marked this pull request as draft May 6, 2025 10:18

github-actions bot commented Sep 1, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the stale label (PRs inactive for over 60 days) Sep 1, 2025
@gggekov gggekov closed this Sep 1, 2025

Labels

CLA Signed, fb-exported, module: arm, stale, topic: not user facing

Projects

None yet

5 participants